Understanding Scoring Through Examples

Rudi Seitz • Location: Theater 4 • Haystack 2020

The built-in scoring mechanism in Elasticsearch and Solr can seem mysterious to beginners and experienced practitioners alike. Instead of delving into the mathematical definitions of TFxIDF and BM25, this talk will help you develop an intuitive understanding of these metrics by walking you through a series of simple examples. Each example consists of a query and a list of several indexed documents. You will be invited to guess which document comes out on top for each query. In each case, we will examine why that particular document gets the highest score and extract the general principle behind this behavior. A set of six examples will be followed by an “extra credit” section focusing on more advanced topics. Along with illustrating all of the key behaviors of BM25, our examples will touch on some of the “gotchas” around scoring in a cluster scenario, where shards and replicas come into play. The talk aims to teach you, in a short time and without any math, everything you’ll ever need to know about scoring. A solid understanding of scoring will prepare you to better diagnose relevance problems and improve relevance in real-world applications.

Rudi Seitz

KMW